D-Score: Holistic Dialogue Evaluation Without Reference

Abstract

In artistic gymnastics, the difficulty score, or D-score, is used for judging performance. Starting from zero, an athlete earns points from different aspects such as composition requirement, difficulty, and connection between moves. The final score is a composition of the quality of various performance indicators. Similarly, when evaluating dialogue responses, human judges generally follow a number of criteria, among which language fluency, context coherence, logical consistency, and semantic appropriateness are at the top of the agenda. In this paper, we propose an automatic dialogue evaluation framework called D-score that resembles the way gymnastics is evaluated. Following the four criteria above, we devise a range of evaluation tasks and model them under a multi-task learning framework. The proposed framework, without relying on any human-written reference, learns to appreciate the overall quality of human-human conversations through a representation that is shared by all tasks, without over-fitting to an individual task domain. We evaluate D-score by performing comprehensive correlation analyses with human judgement on three dialogue evaluation datasets, two of which are from past DSTC series, and benchmark it against state-of-the-art baselines. D-score not only outperforms the best baseline by a large margin in terms of system-level Spearman correlation but also represents an important step towards explainable dialogue scoring.
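The abstract reports results as system-level Spearman correlation with human judgement. As a rough illustration of what such a figure usually measures, the sketch below averages metric scores and human ratings per dialogue system and then rank-correlates the two sets of averages. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the record layout, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all entries tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def system_level_spearman(records):
    """records: iterable of (system_name, metric_score, human_score), one per response.

    Hypothetical layout: scores are averaged per system first, then the
    per-system averages are rank-correlated (Spearman = Pearson on ranks).
    """
    metric, human = defaultdict(list), defaultdict(list)
    for system, m, h in records:
        metric[system].append(m)
        human[system].append(h)
    systems = sorted(metric)
    metric_means = [mean(metric[s]) for s in systems]
    human_means = [mean(human[s]) for s in systems]
    return pearson(average_ranks(metric_means), average_ranks(human_means))


# Made-up scores for three hypothetical systems, two responses each.
example_records = [
    ("system_a", 0.20, 2.0), ("system_a", 0.30, 2.5),
    ("system_b", 0.50, 3.0), ("system_b", 0.60, 3.5),
    ("system_c", 0.80, 4.0), ("system_c", 0.90, 4.5),
]
rho = system_level_spearman(example_records)
```

Because the correlation is computed over per-system averages, it rewards a metric for ranking whole systems in the same order as human judges, even when individual response scores disagree. The sketch assumes at least two systems whose averages are not all identical; otherwise the denominator is zero.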

Similar resources

The Back-translation Score: Automatic MT Evaluation at the Sentence Level without Reference Translations

Automatic tools for machine translation (MT) evaluation such as BLEU are well established, but have the drawbacks that they do not perform well at the sentence level and that they presuppose manually translated reference texts. Assuming that the MT system to be evaluated can deal with both directions of a language pair, in this research we suggest to conduct automatic MT evaluation by determini...

Semantics, Dialogue, and Reference Resolution

Most pronoun resolution research has focused on written corpora while using syntactical and surface cues. Though big gains have been made in this domain with those methods, it is difficult to do better than the 80% coverage in these domains without some world or semantic knowledge. We investigate this issue by incorporating rich semantic information into a proven reference resolution model over...

Sentence-level MT evaluation without reference translations: Beyond language modeling

In this paper we investigate the possibility of evaluating MT quality and fluency at the sentence level in the absence of reference translations. We measure the correlation between automatically-generated scores and human judgments, and we evaluate the performance of our system when used as a classifier for identifying highly dysfluent and illformed sentences. We show that we can substantially ...

A Matlab-Based Tool for Video Quality Evaluation without Reference

This paper deals with the design of a Matlab based tool for measuring video quality with no use of a reference sequence. The main goals are described and the tool and its features are shown. The paper begins with a description of the existing pixel-based no-reference quality metrics. Then, a novel algorithm for simple PSNR estimation of H.264/AVC coded videos is presented as an alternative. The...

XMEANT: Better semantic MT evaluation without reference translations

We introduce XMEANT—a new cross-lingual version of the semantic frame based MT evaluation metric MEANT—which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references. Previous work established that MEANT reflects translation adequacy with state-of-the-art accuracy, and optimizing MT systems against MEANT robustly...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3074012